Overview

Dataset statistics

Number of variables: 10
Number of observations: 20640
Missing cells: 207
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.6 MiB
Average record size in memory: 80.0 B
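
The percentages and sizes above can be cross-checked with a little arithmetic. The sketch below uses only figures reported in this table. The 8 bytes per cell is an assumption: it is consistent with the reported 80.0 B per 10-variable record, but the report does not state it.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 10
n_observations = 20640
missing_cells = 207

# "Missing cells (%)" is taken over all cells, not over rows.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: every cell occupies 8 bytes (for example a 64-bit float or a pointer).
bytes_per_cell = 8
record_size_bytes = n_variables * bytes_per_cell
total_size_mib = record_size_bytes * n_observations / 2**20

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Record size: {record_size_bytes} B, total: {total_size_mib:.1f} MiB")
```

The same 207 missing values are 1.0% of the 20640 rows of a single column, which is why the total_bedrooms warning reports 1.0% while the dataset-wide figure is 0.1%.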

Variable types

Numeric: 9
Categorical: 1

Warnings

longitude is highly correlated with latitude (High correlation)
latitude is highly correlated with longitude (High correlation)
total_rooms is highly correlated with total_bedrooms and 2 other fields (High correlation)
total_bedrooms is highly correlated with total_rooms and 2 other fields (High correlation)
population is highly correlated with total_rooms and 2 other fields (High correlation)
households is highly correlated with total_rooms and 2 other fields (High correlation)
median_income is highly correlated with median_house_value (High correlation)
median_house_value is highly correlated with median_income (High correlation)
total_rooms is highly correlated with population and 2 other fields (High correlation)
ocean_proximity is highly correlated with median_house_value and 2 other fields (High correlation)
total_bedrooms is highly correlated with population and 2 other fields (High correlation)
median_house_value is highly correlated with median_income and 3 other fields (High correlation)
longitude is highly correlated with ocean_proximity and 2 other fields (High correlation)
latitude is highly correlated with ocean_proximity and 2 other fields (High correlation)
households is highly correlated with population and 2 other fields (High correlation)
total_bedrooms has 207 (1.0%) missing values (Missing)

Reproduction

Analysis started: 2021-06-15 07:59:22.945151
Analysis finished: 2021-06-15 07:59:36.374782
Duration: 13.43 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

longitude
Real number (ℝ)

HIGH CORRELATION

Distinct: 844
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -119.5697045
Minimum: -124.35
Maximum: -114.31
Zeros: 0
Zeros (%): 0.0%
Negative: 20640
Negative (%): 100.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: -124.35
5-th percentile: -122.47
Q1: -121.8
median: -118.49
Q3: -118.01
95-th percentile: -117.08
Maximum: -114.31
Range: 10.04
Interquartile range (IQR): 3.79

Descriptive statistics

Standard deviation: 2.003531724
Coefficient of variation (CV): -0.01675618195
Kurtosis: -1.330152366
Mean: -119.5697045
Median Absolute Deviation (MAD): 1.28
Skewness: -0.297801208
Sum: -2467918.7
Variance: 4.014139367
Monotonicity: Not monotonic
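
Several of these statistics are simple functions of one another, which makes them easy to sanity-check. The sketch below recomputes a few of them from the values reported for longitude. It assumes the usual textbook definitions (CV = standard deviation / mean, variance = standard deviation squared, range = maximum - minimum, IQR = Q3 - Q1), not anything confirmed from the pandas-profiling source.

```python
# Values copied from the longitude tables.
mean, std = -119.5697045, 2.003531724
minimum, maximum = -124.35, -114.31
q1, q3 = -121.8, -118.01

cv = std / mean                  # negative only because the mean is negative
variance = std ** 2
value_range = maximum - minimum
iqr = q3 - q1

print(f"CV:       {cv:.8f}")
print(f"Variance: {variance:.6f}")
print(f"Range:    {value_range:.2f}")
print(f"IQR:      {iqr:.2f}")
```

A negative CV such as the one reported here carries no extra meaning: it reflects that every longitude in the dataset is negative.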
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
-118.31               162     0.8%
-118.3                160     0.8%
-118.29               148     0.7%
-118.27               144     0.7%
-118.32               142     0.7%
-118.28               141     0.7%
-118.35               140     0.7%
-118.36               138     0.7%
-118.19               135     0.7%
-118.25               128     0.6%
Other values (834)    19202   93.0%

Value                 Count   Frequency (%)
-124.35               1       < 0.1%
-124.3                2       < 0.1%
-124.27               1       < 0.1%
-124.26               1       < 0.1%
-124.25               1       < 0.1%
-124.23               3       < 0.1%
-124.22               1       < 0.1%
-124.21               3       < 0.1%
-124.19               4       < 0.1%
-124.18               6       < 0.1%

Value                 Count   Frequency (%)
-114.31               1       < 0.1%
-114.47               1       < 0.1%
-114.49               1       < 0.1%
-114.55               1       < 0.1%
-114.56               1       < 0.1%
-114.57               3       < 0.1%
-114.58               2       < 0.1%
-114.59               2       < 0.1%
-114.6                3       < 0.1%
-114.61               3       < 0.1%

latitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 862
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.63186143
Minimum: 32.54
Maximum: 41.95
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 32.54
5-th percentile: 32.82
Q1: 33.93
median: 34.26
Q3: 37.71
95-th percentile: 38.96
Maximum: 41.95
Range: 9.41
Interquartile range (IQR): 3.78

Descriptive statistics

Standard deviation: 2.135952397
Coefficient of variation (CV): 0.05994501302
Kurtosis: -1.117759781
Mean: 35.63186143
Median Absolute Deviation (MAD): 1.23
Skewness: 0.4659530037
Sum: 735441.62
Variance: 4.562292644
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
34.06                 244     1.2%
34.05                 236     1.1%
34.08                 234     1.1%
34.07                 231     1.1%
34.04                 221     1.1%
34.09                 212     1.0%
34.02                 208     1.0%
34.1                  203     1.0%
34.03                 193     0.9%
33.93                 181     0.9%
Other values (852)    18477   89.5%

Value                 Count   Frequency (%)
32.54                 1       < 0.1%
32.55                 3       < 0.1%
32.56                 10      < 0.1%
32.57                 18      0.1%
32.58                 26      0.1%
32.59                 11      0.1%
32.6                  9       < 0.1%
32.61                 14      0.1%
32.62                 13      0.1%
32.63                 18      0.1%

Value                 Count   Frequency (%)
41.95                 2       < 0.1%
41.92                 1       < 0.1%
41.88                 1       < 0.1%
41.86                 3       < 0.1%
41.84                 1       < 0.1%
41.82                 1       < 0.1%
41.81                 2       < 0.1%
41.8                  3       < 0.1%
41.79                 1       < 0.1%
41.78                 3       < 0.1%

housing_median_age
Real number (ℝ≥0)

Distinct: 52
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.63948643
Minimum: 1
Maximum: 52
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 18
median: 29
Q3: 37
95-th percentile: 52
Maximum: 52
Range: 51
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 12.58555761
Coefficient of variation (CV): 0.4394477408
Kurtosis: -0.8006288536
Mean: 28.63948643
Median Absolute Deviation (MAD): 10
Skewness: 0.0603306376
Sum: 591119
Variance: 158.3962604
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
52                    1273    6.2%
36                    862     4.2%
35                    824     4.0%
16                    771     3.7%
17                    698     3.4%
34                    689     3.3%
26                    619     3.0%
33                    615     3.0%
18                    570     2.8%
25                    566     2.7%
Other values (42)     13153   63.7%

Value                 Count   Frequency (%)
1                     4       < 0.1%
2                     58      0.3%
3                     62      0.3%
4                     191     0.9%
5                     244     1.2%
6                     160     0.8%
7                     175     0.8%
8                     206     1.0%
9                     205     1.0%
10                    264     1.3%

Value                 Count   Frequency (%)
52                    1273    6.2%
51                    48      0.2%
50                    136     0.7%
49                    134     0.6%
48                    177     0.9%
47                    198     1.0%
46                    245     1.2%
45                    294     1.4%
44                    356     1.7%
43                    353     1.7%

total_rooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 5926
Distinct (%): 28.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2635.763081
Minimum: 2
Maximum: 39320
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 2
5-th percentile: 620.95
Q1: 1447.75
median: 2127
Q3: 3148
95-th percentile: 6213.2
Maximum: 39320
Range: 39318
Interquartile range (IQR): 1700.25

Descriptive statistics

Standard deviation: 2181.615252
Coefficient of variation (CV): 0.8276977802
Kurtosis: 32.630927
Mean: 2635.763081
Median Absolute Deviation (MAD): 797
Skewness: 4.147343451
Sum: 54402150
Variance: 4759445.106
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
1527                  18      0.1%
1582                  17      0.1%
1613                  17      0.1%
2127                  16      0.1%
2053                  15      0.1%
1703                  15      0.1%
1471                  15      0.1%
1717                  15      0.1%
1607                  15      0.1%
1722                  15      0.1%
Other values (5916)   20482   99.2%

Value                 Count   Frequency (%)
2                     1       < 0.1%
6                     1       < 0.1%
8                     1       < 0.1%
11                    1       < 0.1%
12                    1       < 0.1%
15                    2       < 0.1%
16                    1       < 0.1%
18                    4       < 0.1%
19                    2       < 0.1%
20                    2       < 0.1%

Value                 Count   Frequency (%)
39320                 1       < 0.1%
37937                 1       < 0.1%
32627                 1       < 0.1%
32054                 1       < 0.1%
30450                 1       < 0.1%
30405                 1       < 0.1%
30401                 1       < 0.1%
28258                 1       < 0.1%
27870                 1       < 0.1%
27700                 1       < 0.1%

total_bedrooms
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 1923
Distinct (%): 9.4%
Missing: 207
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 537.8705525
Minimum: 1
Maximum: 6445
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 137
Q1: 296
median: 435
Q3: 647
95-th percentile: 1275.4
Maximum: 6445
Range: 6444
Interquartile range (IQR): 351

Descriptive statistics

Standard deviation: 421.3850701
Coefficient of variation (CV): 0.7834321252
Kurtosis: 21.98557506
Mean: 537.8705525
Median Absolute Deviation (MAD): 162
Skewness: 3.459546332
Sum: 10990309
Variance: 177565.3773
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
280                   55      0.3%
331                   51      0.2%
345                   50      0.2%
343                   49      0.2%
393                   49      0.2%
394                   48      0.2%
348                   48      0.2%
328                   48      0.2%
272                   47      0.2%
309                   47      0.2%
Other values (1913)   19941   96.6%
(Missing)             207     1.0%

Value                 Count   Frequency (%)
1                     1       < 0.1%
2                     2       < 0.1%
3                     5       < 0.1%
4                     7       < 0.1%
5                     6       < 0.1%
6                     5       < 0.1%
7                     6       < 0.1%
8                     8       < 0.1%
9                     7       < 0.1%
10                    8       < 0.1%

Value                 Count   Frequency (%)
6445                  1       < 0.1%
6210                  1       < 0.1%
5471                  1       < 0.1%
5419                  1       < 0.1%
5290                  1       < 0.1%
5033                  1       < 0.1%
5027                  1       < 0.1%
4957                  1       < 0.1%
4952                  1       < 0.1%
4819                  1       < 0.1%

population
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3888
Distinct (%): 18.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1425.476744
Minimum: 3
Maximum: 35682
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 3
5-th percentile: 348
Q1: 787
median: 1166
Q3: 1725
95-th percentile: 3288
Maximum: 35682
Range: 35679
Interquartile range (IQR): 938

Descriptive statistics

Standard deviation: 1132.462122
Coefficient of variation (CV): 0.7944444737
Kurtosis: 73.55311639
Mean: 1425.476744
Median Absolute Deviation (MAD): 440
Skewness: 4.935858227
Sum: 29421840
Variance: 1282470.457
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
891                   25      0.1%
761                   24      0.1%
850                   24      0.1%
1227                  24      0.1%
1052                  24      0.1%
825                   23      0.1%
782                   22      0.1%
1005                  22      0.1%
999                   22      0.1%
781                   21      0.1%
Other values (3878)   20409   98.9%

Value                 Count   Frequency (%)
3                     1       < 0.1%
5                     1       < 0.1%
6                     1       < 0.1%
8                     4       < 0.1%
9                     2       < 0.1%
11                    1       < 0.1%
13                    4       < 0.1%
14                    3       < 0.1%
15                    2       < 0.1%
17                    2       < 0.1%

Value                 Count   Frequency (%)
35682                 1       < 0.1%
28566                 1       < 0.1%
16305                 1       < 0.1%
16122                 1       < 0.1%
15507                 1       < 0.1%
15037                 1       < 0.1%
13251                 1       < 0.1%
12873                 1       < 0.1%
12427                 1       < 0.1%
12203                 1       < 0.1%

households
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1815
Distinct (%): 8.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 499.5396802
Minimum: 1
Maximum: 6082
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 125
Q1: 280
median: 409
Q3: 605
95-th percentile: 1162
Maximum: 6082
Range: 6081
Interquartile range (IQR): 325

Descriptive statistics

Standard deviation: 382.3297528
Coefficient of variation (CV): 0.7653641301
Kurtosis: 22.05798806
Mean: 499.5396802
Median Absolute Deviation (MAD): 151
Skewness: 3.410437712
Sum: 10310499
Variance: 146176.0399
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
306                   57      0.3%
386                   56      0.3%
335                   56      0.3%
282                   55      0.3%
429                   54      0.3%
375                   53      0.3%
297                   51      0.2%
284                   51      0.2%
362                   50      0.2%
340                   50      0.2%
Other values (1805)   20107   97.4%

Value                 Count   Frequency (%)
1                     1       < 0.1%
2                     3       < 0.1%
3                     4       < 0.1%
4                     4       < 0.1%
5                     7       < 0.1%
6                     5       < 0.1%
7                     10      < 0.1%
8                     8       < 0.1%
9                     9       < 0.1%
10                    7       < 0.1%

Value                 Count   Frequency (%)
6082                  1       < 0.1%
5358                  1       < 0.1%
5189                  1       < 0.1%
5050                  1       < 0.1%
4930                  1       < 0.1%
4855                  1       < 0.1%
4769                  1       < 0.1%
4616                  1       < 0.1%
4490                  1       < 0.1%
4372                  1       < 0.1%

median_income
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 12928
Distinct (%): 62.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.870671003
Minimum: 0.4999
Maximum: 15.0001
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 0.4999
5-th percentile: 1.60057
Q1: 2.5634
median: 3.5348
Q3: 4.74325
95-th percentile: 7.300305
Maximum: 15.0001
Range: 14.5002
Interquartile range (IQR): 2.17985

Descriptive statistics

Standard deviation: 1.899821718
Coefficient of variation (CV): 0.4908249026
Kurtosis: 4.952524102
Mean: 3.870671003
Median Absolute Deviation (MAD): 1.0642
Skewness: 1.646656702
Sum: 79890.6495
Variance: 3.60932256
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
3.125                 49      0.2%
15.0001               49      0.2%
2.875                 46      0.2%
2.625                 44      0.2%
4.125                 44      0.2%
3.875                 41      0.2%
3.375                 38      0.2%
3                     38      0.2%
4                     37      0.2%
3.625                 37      0.2%
Other values (12918)  20217   98.0%

Value                 Count   Frequency (%)
0.4999                12      0.1%
0.536                 10      < 0.1%
0.5495                1       < 0.1%
0.6433                1       < 0.1%
0.6775                1       < 0.1%
0.6825                1       < 0.1%
0.6831                1       < 0.1%
0.696                 1       < 0.1%
0.6991                1       < 0.1%
0.7007                1       < 0.1%

Value                 Count   Frequency (%)
15.0001               49      0.2%
15                    2       < 0.1%
14.9009               1       < 0.1%
14.5833               1       < 0.1%
14.4219               1       < 0.1%
14.4113               1       < 0.1%
14.2959               1       < 0.1%
14.2867               1       < 0.1%
13.947                1       < 0.1%
13.8556               1       < 0.1%

median_house_value
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3842
Distinct (%): 18.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 206855.8169
Minimum: 14999
Maximum: 500001
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 14999
5-th percentile: 66200
Q1: 119600
median: 179700
Q3: 264725
95-th percentile: 489810
Maximum: 500001
Range: 485002
Interquartile range (IQR): 145125

Descriptive statistics

Standard deviation: 115395.6159
Coefficient of variation (CV): 0.55785531
Kurtosis: 0.3278702429
Mean: 206855.8169
Median Absolute Deviation (MAD): 68400
Skewness: 0.9777632739
Sum: 4269504061
Variance: 1.331614816 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
500001                965     4.7%
137500                122     0.6%
162500                117     0.6%
112500                103     0.5%
187500                93      0.5%
225000                92      0.4%
350000                79      0.4%
87500                 78      0.4%
275000                65      0.3%
150000                64      0.3%
Other values (3832)   18862   91.4%

Value                 Count   Frequency (%)
14999                 4       < 0.1%
17500                 1       < 0.1%
22500                 4       < 0.1%
25000                 1       < 0.1%
26600                 1       < 0.1%
26900                 1       < 0.1%
27500                 1       < 0.1%
28300                 1       < 0.1%
30000                 2       < 0.1%
32500                 4       < 0.1%

Value                 Count   Frequency (%)
500001                965     4.7%
500000                27      0.1%
499100                1       < 0.1%
499000                1       < 0.1%
498800                1       < 0.1%
498700                1       < 0.1%
498600                1       < 0.1%
498400                1       < 0.1%
497600                1       < 0.1%
497400                1       < 0.1%

ocean_proximity
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 161.4 KiB

<1H OCEAN: 9136
INLAND: 6551
NEAR OCEAN: 2658
NEAR BAY: 2290
ISLAND: 5

Length

Max length: 10
Median length: 9
Mean length: 8.064922481
Min length: 6
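
These length statistics follow directly from the five category labels and their counts. As a rough check, the sketch below rebuilds one string length per observation from the counts reported for ocean_proximity and recomputes the figures. It assumes length means the number of characters in the label, spaces included.

```python
from statistics import median

# Category counts copied from the ocean_proximity summary.
counts = {
    "<1H OCEAN": 9136,
    "INLAND": 6551,
    "NEAR OCEAN": 2658,
    "NEAR BAY": 2290,
    "ISLAND": 5,
}

# One length per observation, so the list has 20640 entries.
lengths = [len(label) for label, n in counts.items() for _ in range(n)]

total_characters = sum(lengths)
mean_length = total_characters / len(lengths)

print("Max length:", max(lengths))
print("Median length:", median(lengths))
print("Mean length:", round(mean_length, 9))
print("Min length:", min(lengths))
print("Total characters:", total_characters)
```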

Characters and Unicode

Total characters: 166460
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NEAR BAY
2nd row: NEAR BAY
3rd row: NEAR BAY
4th row: NEAR BAY
5th row: NEAR BAY

Common Values

Value                 Count   Frequency (%)
<1H OCEAN             9136    44.3%
INLAND                6551    31.7%
NEAR OCEAN            2658    12.9%
NEAR BAY              2290    11.1%
ISLAND                5       < 0.1%
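
The Frequency (%) column is each count divided by the 20640 observations. A minimal sketch of that calculation follows. The `format_share` helper is hypothetical, and its "< 0.1%" rule (used whenever a non-zero share rounds to 0.0) is inferred from the tables in this report rather than taken from the pandas-profiling source.

```python
# Counts copied from the "Common Values" table.
counts = {
    "<1H OCEAN": 9136,
    "INLAND": 6551,
    "NEAR OCEAN": 2658,
    "NEAR BAY": 2290,
    "ISLAND": 5,
}
total = sum(counts.values())


def format_share(count, total):
    """Format count/total as a percentage with one decimal place.

    Assumed display rule: a non-zero share that rounds to 0.0 is shown as '< 0.1%'.
    """
    text = f"{100 * count / total:.1f}"
    if count > 0 and text == "0.0":
        return "< 0.1%"
    return text + "%"


for label, count in counts.items():
    print(f"{label:<12}{count:>6}  {format_share(count, total)}")
```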

Length

Histogram of lengths of the category

Pie chart

[Figure: pie chart]

Value                 Count   Frequency (%)
ocean                 11794   34.0%
1h                    9136    26.3%
inland                6551    18.9%
near                  4948    14.2%
bay                   2290    6.6%
island                5       < 0.1%

Most occurring characters

Value                 Count   Frequency (%)
N                     29849   17.9%
A                     25588   15.4%
E                     16742   10.1%
(space)               14084   8.5%
O                     11794   7.1%
C                     11794   7.1%
<                     9136    5.5%
1                     9136    5.5%
H                     9136    5.5%
I                     6556    3.9%
Other values (6)      22645   13.6%

Most occurring categories

Value                 Count   Frequency (%)
Uppercase Letter      134104  80.6%
Space Separator       14084   8.5%
Math Symbol           9136    5.5%
Decimal Number        9136    5.5%

Most frequent character per category

Uppercase Letter

Value                 Count   Frequency (%)
N                     29849   22.3%
A                     25588   19.1%
E                     16742   12.5%
O                     11794   8.8%
C                     11794   8.8%
H                     9136    6.8%
I                     6556    4.9%
L                     6556    4.9%
D                     6556    4.9%
R                     4948    3.7%
Other values (3)      4585    3.4%

Space Separator

Value                 Count   Frequency (%)
(space)               14084   100.0%

Math Symbol

Value                 Count   Frequency (%)
<                     9136    100.0%

Decimal Number

Value                 Count   Frequency (%)
1                     9136    100.0%

Most occurring scripts

Value                 Count   Frequency (%)
Latin                 134104  80.6%
Common                32356   19.4%

Most frequent character per script

Latin

Value                 Count   Frequency (%)
N                     29849   22.3%
A                     25588   19.1%
E                     16742   12.5%
O                     11794   8.8%
C                     11794   8.8%
H                     9136    6.8%
I                     6556    4.9%
L                     6556    4.9%
D                     6556    4.9%
R                     4948    3.7%
Other values (3)      4585    3.4%

Common

Value                 Count   Frequency (%)
(space)               14084   43.5%
<                     9136    28.2%
1                     9136    28.2%

Most occurring blocks

Value                 Count   Frequency (%)
ASCII                 166460  100.0%

Most frequent character per block

ASCII

Value                 Count   Frequency (%)
N                     29849   17.9%
A                     25588   15.4%
E                     16742   10.1%
(space)               14084   8.5%
O                     11794   7.1%
C                     11794   7.1%
<                     9136    5.5%
1                     9136    5.5%
H                     9136    5.5%
I                     6556    3.9%
Other values (6)      22645   13.6%

Interactions

[Figures: 81 interaction plots; images not included in this text]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
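
The calculation described above can be sketched in a few lines of plain Python. This is an illustration of the definition, not the code the report itself runs. The function name and the toy data are made up, and population (divide by n) formulas are used throughout, which cancel in the ratio.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# Made-up toy data, not taken from the dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]   # exact increasing linear function of x

print(pearson_r(x, y))                        # close to +1
print(pearson_r(x, [3 * v + 7 for v in y]))   # shifting and rescaling y leaves r unchanged
print(pearson_r(x, [-v for v in y]))          # close to -1
```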

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
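
In other words, ρ is Pearson's r applied to ranks. The sketch below illustrates this with made-up data. The helper names are hypothetical, and giving tied values the average of their ranks is an assumption (a common convention), not something stated in the report.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Made-up toy data: y is a monotonic but nonlinear function of x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print("Pearson r:   ", pearson_r(x, y))      # below 1, the relation is not linear
print("Spearman rho:", spearman_rho(x, y))   # close to 1, the relation is monotonic
```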

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
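
A direct, quadratic-time sketch of that definition is shown below, using made-up data. It implements the variant usually called tau-a, in which tied pairs count as neither concordant nor discordant. Library implementations often apply a tie correction (tau-b), so their results can differ when ties are present.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs (tau-a)."""
    pairs = list(combinations(range(len(xs)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Made-up toy data: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]

print(kendall_tau(x, y))
```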

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
0      -122.23    37.88     41                  880          129.0           322         126         8.3252         452600              NEAR BAY
1      -122.22    37.86     21                  7099         1106.0          2401        1138        8.3014         358500              NEAR BAY
2      -122.24    37.85     52                  1467         190.0           496         177         7.2574         352100              NEAR BAY
3      -122.25    37.85     52                  1274         235.0           558         219         5.6431         341300              NEAR BAY
4      -122.25    37.85     52                  1627         280.0           565         259         3.8462         342200              NEAR BAY
5      -122.25    37.85     52                  919          213.0           413         193         4.0368         269700              NEAR BAY
6      -122.25    37.84     52                  2535         489.0           1094        514         3.6591         299200              NEAR BAY
7      -122.25    37.84     52                  3104         687.0           1157        647         3.1200         241400              NEAR BAY
8      -122.26    37.84     42                  2555         665.0           1206        595         2.0804         226700              NEAR BAY
9      -122.25    37.84     52                  3549         707.0           1551        714         3.6912         261100              NEAR BAY

Last rows

       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
20630  -121.32    39.29     11                  2640         505.0           1257        445         3.5673         112000              INLAND
20631  -121.40    39.33     15                  2655         493.0           1200        432         3.5179         107200              INLAND
20632  -121.45    39.26     15                  2319         416.0           1047        385         3.1250         115600              INLAND
20633  -121.53    39.19     27                  2080         412.0           1082        382         2.5495         98300               INLAND
20634  -121.56    39.27     28                  2332         395.0           1041        344         3.7125         116800              INLAND
20635  -121.09    39.48     25                  1665         374.0           845         330         1.5603         78100               INLAND
20636  -121.21    39.49     18                  697          150.0           356         114         2.5568         77100               INLAND
20637  -121.22    39.43     17                  2254         485.0           1007        433         1.7000         92300               INLAND
20638  -121.32    39.43     18                  1860         409.0           741         349         1.8672         84700               INLAND
20639  -121.24    39.37     16                  2785         616.0           1387        530         2.3886         89400               INLAND